Automatic discovery of sub-molecular sequence domains in multi-aligned sequences: a dynamic programming algorithm for multiple alignment segmentation.

نویسندگان

  • E P Xing
  • D M Wolf
  • I Dubchak
  • S Spengler
  • M Zorn
  • I Muchnik
  • C Kulikowski
چکیده

Automatic identification of sub-structures in multi-aligned sequences is of great importance for effective and objective structural/functional domain annotation, phylogenetic treeing and other molecular analyses. We present a segmentation algorithm that optimally partitions a given multi-alignment into a set of potentially biologically significant blocks, or segments. This algorithm applies dynamic programming and progressive optimization to the statistical profile of a multi-alignment in order to optimally demarcate relatively homogenous sub-regions. Using this algorithm, a large multi-alignment of eukaryotic 16S rRNA was analyzed. Three types of sequence patterns were identified automatically and efficiently: shared conserved domain; shared variable motif; and rare signature sequence. Results were consistent with the patterns identified through independent phylogenetic and structural approaches. This algorithm facilitates the automation of sequence-based molecular structural and evolutionary analyses through statistical modeling and high performance computation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Application of the ABS LX Algorithm to Multiple Sequence Alignment

We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...

متن کامل

gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...

متن کامل

A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons.

An algorithm is presented for the multiple alignment of protein sequences that is both accurate and rapid computationally. The approach is based on the conventional dynamic-programming method of pairwise alignment. Initially, two sequences are aligned, then the third sequence is aligned against the alignment of both sequences one and two. Similarly, the fourth sequence is aligned against one, t...

متن کامل

Analysis of Ribosomal RNA Sequences by Combinatorial Clustering

We present an analysis of multi-aligned eukaryotic and procaryotic small subunit rRNA sequences using a novel segmentation and clustering procedure capable of extracting subsets of sequences that share common sequence features. This procedure consists of: i) segmentation of aligned sequences using a dynamic programming procedure, and subsequent identification of likely conserved segments; ii) f...

متن کامل

Biogeometry Research Faster Multiple Sequence Alignment Algorithms Based on Pairwise Segmentation

Multiple Sequence Alignment (MSA) is a central problem in computational molecular biology --it identifies and quantifies similarities among several protein or DNA sequences.The well-known dynamic programming (DP) algorithms align k sequences (each of length n) by constructing a k-dimensional grid graph of size O(nk), with each of the sequences enumerating one of the dimensions of the grid. The ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of theoretical biology

دوره 212 2  شماره 

صفحات  -

تاریخ انتشار 2001